
    GTH-UPM system for search on speech evaluation

    This paper describes the GTH-UPM system for the Albayzin 2014 Search on Speech Evaluation. The evaluation task consists of searching for a list of terms/queries in audio files. The GTH-UPM system is based on an LVCSR (Large Vocabulary Continuous Speech Recognition) system. We used the MAVIR corpus and the Spanish partition of the EPPS (European Parliament Plenary Sessions) database to train both the acoustic and the language models. The main effort focused on lexicon preparation and on text selection for building the language model. The system uses different lexicons and language models depending on the task being performed. With the best system configuration on the development set, we obtained a FOM (figure of merit) of 75.27 for the keyword spotting task.

    Comparison of characterization methods for MER signals

    This document presents a comparison of the proposed methods for characterizing signals from microelectrode recordings (MER), with the aim of identifying the brain zones involved in Parkinson's disease surgery. The best accuracy rates are obtained using the wavelet transform as the characterization method: 97.37% and 71.4% for 2 and 4 classes, respectively.
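The abstract does not say which wavelet family, decomposition depth, or statistics were used. As a hedged illustration only, the sketch below assumes an orthonormal Haar wavelet, four decomposition levels, and three statistics per sub-band (mean absolute coefficient, standard deviation, relative energy); `haar_dwt` and `wavelet_features` are hypothetical names, not the authors' code.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail


def wavelet_features(signal, levels=4):
    """Per-sub-band statistics of a multilevel Haar DWT.

    Assumption (not from the paper): Haar wavelet, `levels` detail bands plus the
    final approximation band, and [mean |c|, std, relative energy] per band.
    """
    bands = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)
    bands.append(approx)                # final approximation band
    energies = np.array([np.sum(b ** 2) for b in bands])
    total = energies.sum() or 1.0       # guard against an all-zero segment
    feats = []
    for band, energy in zip(bands, energies):
        feats.extend([np.mean(np.abs(band)), np.std(band), energy / total])
    return np.array(feats)


rng = np.random.default_rng(0)
segment = rng.standard_normal(1024)     # stand-in for one MER signal segment
print(wavelet_features(segment).shape)  # (15,)
```

The resulting fixed-length vector could be fed to any classifier for the 2-class or 4-class zone identification; which classifier the authors used is not stated in the abstract.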

    Spanish generation from Spanish Sign Language using a phrase-based translation system

    This paper describes the development of a spoken Spanish generator from Spanish Sign Language (LSE, Lengua de Signos Española) in a specific domain: the renewal of the Identity Document and driver's license. The system is composed of three modules. The first is an interface where a deaf person can specify a sign sequence in sign-writing. The second is a language translator that converts the sign sequence into a word sequence. The last module is a text-to-speech converter. The paper also describes the generation of a parallel corpus for system development, composed of more than 4,000 Spanish sentences and their LSE translations in the application domain. The paper focuses on the translation module, which uses a statistical strategy with a phrase-based translation model, and analyzes the effect of the alignment configuration used when generating the word-based translation model. The best configuration achieves a 3.90% mWER and a BLEU of 0.9645.
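mWER (multiple-reference WER) scores each hypothesis against several valid reference translations and keeps the closest one. The sketch below is a minimal version under stated assumptions: whitespace tokenization, word-level Levenshtein distance, and normalization by the length of the closest reference (conventions differ between toolkits). The function names and the example sentences are hypothetical.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))          # distances for an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,                  # extra hypothesis word (insertion)
                cur[j - 1] + 1,               # missing reference word (deletion)
                prev[j - 1] + (h != r),       # substitution, or a match at no cost
            )
        prev = cur
    return prev[-1]


def mwer(hypotheses, references):
    """Multiple-reference WER in percent.

    hypotheses: list of hypothesis strings.
    references: list of lists; references[k] holds every valid translation of sentence k.
    Assumption: errors are normalized by the length of the closest reference.
    """
    errors = 0
    ref_words = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        # Keep the reference that needs the fewest edits for this hypothesis.
        best = min((r.split() for r in refs), key=lambda toks: edit_distance(hyp_tokens, toks))
        errors += edit_distance(hyp_tokens, best)
        ref_words += len(best)
    return 100.0 * errors / ref_words


print(mwer(["quiero renovar el carnet"],
           [["quiero renovar el carnet de conducir", "quiero renovar el carnet"]]))  # 0.0
```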

    New experiments on speaker diarization for unsupervised speaking style voice building for speech synthesis

    The universal use of speech synthesis in different applications would require simple development of new voices with little manual intervention. Given the amount of multimedia data available on the Internet and in the media, an interesting goal is to develop tools and methods that automatically build speaking-style voices from these sources. In previous work, a methodology for building such tools was outlined, and preliminary experiments with a multi-style database were presented. In this paper, we investigate this task further and propose several improvements based on selecting the appropriate number of initial speakers, using or not using noise-reduction filters, using F0, and using a music detection algorithm. We show that the best system, which uses a music detection algorithm, reduces the precision error by 22.36% relative on the development set and by 39.64% relative on the test set compared with the baseline system, without degrading the figure of merit. The mean precision on the test set is 90.62%, ranging from 76.18% for feature reports to 99.93% for weather reports.

    Extended phone log-likelihood ratio features and acoustic-based I-vectors for language recognition

    This paper presents new techniques that yield relevant improvements over the primary system our group submitted to the Albayzin 2012 LRE competition, in which the use of any additional corpora for training or optimizing the models was forbidden. In this work, we incorporate an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also report results for Fact, the official metric during the evaluation). We show that using these features at the phone-state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We also experimented with applying alternative SDC-like configurations to these PLLR features, obtaining additional improvements. Finally, we describe some modifications to the MFCC-based acoustic i-vector system that contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
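A PLLR feature is commonly defined as the log-odds of each frame-level phone posterior, normalized by the number of competing phones: for N phones, PLLR_i = log(p_i / ((1 - p_i) / (N - 1))). The minimal sketch below assumes that definition, a small clipping constant `eps` to avoid log(0), and PCA fitted with an SVD on the centered features. In practice the posteriors would come from a phone recognizer; here they are random stand-ins, and the function names are hypothetical.

```python
import numpy as np


def pllr(posteriors, eps=1e-6):
    """Phone log-likelihood ratios from frame-level phone posteriors (frames x phones).

    Assumption: PLLR_i = log(p_i / ((1 - p_i) / (N - 1))), with posteriors
    clipped to [eps, 1 - eps] to avoid log(0).
    """
    p = np.clip(np.asarray(posteriors, dtype=float), eps, 1.0 - eps)
    n_phones = p.shape[1]
    return np.log(p / ((1.0 - p) / (n_phones - 1)))


def fit_pca(features, k):
    """Return the feature mean and a (dims x k) projection onto the top-k principal axes."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k].T


def apply_pca(features, mean, projection):
    """Center the features and project them onto the retained principal axes."""
    return (features - mean) @ projection


# Hypothetical posteriors for 200 frames and 30 phones (softmax of random scores).
rng = np.random.default_rng(1)
scores = rng.standard_normal((200, 30))
posteriors = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
feats = pllr(posteriors)
mean, proj = fit_pca(feats, k=10)
reduced = apply_pca(feats, mean, proj)
print(reduced.shape)  # (200, 10)
```

Fitting PCA on training data and reusing the same `mean` and projection on test data keeps both in one feature space; the retained dimension `k=10` is an arbitrary illustrative value.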

    SMS language to Spanish translation system

    This article describes the process followed to develop a system that translates SMS (Short Message Service) language into Spanish. First, the database needed to develop the system is created; it contains more than 11,000 terms and expressions in SMS language with their Spanish translations, as well as example sentences in SMS language for an initial evaluation of the system. The complete architecture consists of a statistical machine translator based on subphrases (word sequences) and a set of functions implemented to act on the sentences in real time. The architecture is evaluated with the following metrics: WER (word error rate), BLEU ("BiLingual Evaluation Understudy") and NIST. The best experiment yields a word error rate of 20.2%.
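The system described above is a phrase-based statistical translator; the sketch below is not that system but a much simpler, hypothetical baseline that shows how a term dictionary like the 11,000-entry database could be applied directly: greedy longest-match replacement of SMS terms and multi-word expressions. The lexicon entries and the `translate_sms` name are invented for illustration.

```python
def translate_sms(sentence, lexicon, max_len=3):
    """Greedy longest-match replacement of SMS terms and expressions.

    lexicon maps SMS terms (up to max_len words, lowercase) to Spanish text.
    Words not found in the lexicon are copied unchanged.
    """
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate expression first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lexicon:
                out.append(lexicon[phrase])
                i += n
                break
        else:
            out.append(words[i])  # unknown word: keep it as-is
            i += 1
    return " ".join(out)


# Hypothetical lexicon entries, for illustration only.
lexicon = {"q": "que", "xq": "por qué", "tb": "también", "k tal": "qué tal", "bss": "besos"}
print(translate_sms("K tal xq no vienes tb bss", lexicon))
# → qué tal por qué no vienes también besos
```

A statistical phrase-based model goes beyond this by scoring alternative segmentations and reorderings with a language model, which a dictionary lookup cannot do.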

    Dynamic topic-based adaptation of language models: a comparison between different approaches

    This paper presents a dynamic LM adaptation based on the topic identified in a speech segment. We use LSA and the topic labels given in the training dataset to obtain and use the topic models. We propose a dynamic language model adaptation to improve recognition performance in a two-stage AST system. The final stage uses topic identification with two variants: the first uses only the most probable topic, and the other depends on the relative distances of the identified topics. We perform the LM adaptation as a linear interpolation between a background model and a topic-based LM. The interpolation weight is dynamically adapted according to different parameters. The proposed method is evaluated on the Spanish partition of the EPPS speech database. We achieved a relative WER reduction of 11.13% over the baseline system, which uses a single background LM.
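The interpolation described above can be written as P(w|h) = λ·P_topic(w|h) + (1 - λ)·P_bg(w|h). The sketch below assumes unigram distributions stored as dictionaries (a real system would use n-gram LMs), takes λ as an input because the abstract does not specify how it is adapted, and, for the second variant, builds the topic LM as a mixture of per-topic LMs with weights proportional to a similarity score (for example an LSA cosine similarity). Passing a single topic reproduces the first variant. All names and numbers are hypothetical.

```python
def topic_mixture(topic_lms, similarities):
    """Mix per-topic word distributions with weights proportional to topic similarity.

    Assumption: similarities are non-negative scores (e.g. LSA cosine similarity).
    Passing only the most probable topic gives the single-topic variant.
    """
    total = sum(similarities.values())
    weights = {t: s / total for t, s in similarities.items()}
    vocab = set().union(*(topic_lms[t] for t in weights))
    return {w: sum(wt * topic_lms[t].get(w, 0.0) for t, wt in weights.items()) for w in vocab}


def interpolate(background_lm, topic_lm, lam):
    """Linear interpolation: P(w) = lam * P_topic(w) + (1 - lam) * P_bg(w)."""
    vocab = set(background_lm) | set(topic_lm)
    return {w: lam * topic_lm.get(w, 0.0) + (1.0 - lam) * background_lm.get(w, 0.0)
            for w in vocab}


# Hypothetical unigram models (probabilities sum to 1 in each).
background = {"el": 0.5, "parlamento": 0.3, "pesca": 0.2}
topics = {
    "pesca": {"pesca": 0.6, "el": 0.4},
    "agricultura": {"agricultura": 0.5, "el": 0.5},
}
mixed = topic_mixture(topics, {"pesca": 3.0, "agricultura": 1.0})
adapted = interpolate(background, mixed, lam=0.4)
print(round(adapted["pesca"], 2))  # 0.3
```

Because every input distribution sums to 1 and the weights are convex, the adapted model also sums to 1, so no renormalization step is needed.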

    Selection of TDOA Parameters for MDM Speaker Diarization

    Several methods to improve multiple distant microphone (MDM) speaker diarization based on Time Delay of Arrival (TDOA) features are evaluated in this paper. All of them avoid using a single reference channel to calculate the TDOA values and, based on different criteria, select from all possible microphone pairs a set of pairs to be used to estimate the TDOAs. The evaluated methods are named "Dynamic Margin" (DM), "Extreme Regions" (ER), "Most Common" (MC), "Cross Correlation" (XCorr) and "Principal Component Analysis" (PCA). All methods improve the baseline results on the development set, and four of them also improve the results on the evaluation set. On the test set, DM and ER obtain relative DER improvements of 3.49% and 10.77%, respectively, while XCorr and PCA achieve relative DER improvements of 36.72% and 30.82%. Moreover, the computational cost of the XCorr method is 20% lower than that of the baseline.
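The abstract does not state how each TDOA is computed. A common choice, assumed here, is the generalized cross-correlation with phase transform (GCC-PHAT). The sketch estimates a TDOA for every microphone pair instead of against a single reference channel, and also returns the correlation peak value, which is one possible reliability score for ranking pairs. It is not the DM, ER, MC, XCorr or PCA selection criterion from the paper, and the function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """GCC-PHAT delay of `sig` relative to `ref` in seconds (positive: sig lags), plus peak value."""
    n = len(sig) + len(ref)                      # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12               # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:                      # restrict the search to physically plausible lags
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max_shift..+max_shift
    idx = int(np.argmax(np.abs(cc)))
    return (idx - max_shift) / fs, float(np.abs(cc[idx]))


def pairwise_tdoas(channels, fs, max_tau=None):
    """TDOA and peak value for every microphone pair (i, j) with i < j; no reference channel."""
    return {(i, j): gcc_phat(channels[i], channels[j], fs, max_tau)
            for i, j in combinations(range(len(channels)), 2)}


def delay(signal, d):
    """Delay a signal by d samples (zero-filled), to simulate a farther microphone."""
    return np.concatenate((np.zeros(d), signal[:len(signal) - d]))


rng = np.random.default_rng(0)
fs = 16000
source = rng.standard_normal(4000)               # stand-in for a speech segment
mics = [source, delay(source, 3), delay(source, 7)]
for pair, (tau, peak) in pairwise_tdoas(mics, fs, max_tau=0.001).items():
    print(pair, round(tau * fs), round(peak, 3))
```

A pair-selection method would then keep only a subset of these pairs (for example those with the most reliable peaks) before building the TDOA feature vector.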

    Diagnostic tool for patients with ADHD

    Video about the research project, presenting its main ideas regarding the management and objective of the project. The video aims to introduce the application and its advantages in the area addressed.